{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "CRF.ipynb",
      "version": "0.3.2",
      "provenance": [],
      "collapsed_sections": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "fo3mvewZFRAt",
        "colab_type": "text"
      },
      "source": [
        "# CRF, Conditional Random Field"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "nXwJKCfSFcnL",
        "colab_type": "text"
      },
      "source": [
        "1．概率无向图模型是由无向图表示的联合概率分布。无向图上的结点之间的连接关系表示了联合分布的随机变量集合之间的条件独立性，即马尔可夫性。因此，概率无向图模型也称为马尔可夫随机场。\n",
        "\n",
        "概率无向图模型或马尔可夫随机场的联合概率分布可以分解为无向图最大团上的正值函数的乘积的形式。\n",
        "\n",
        "2．条件随机场是给定输入随机变量X条件下，输出随机变量Y的条件概率分布模型， 其形式为参数化的对数线性模型。条件随机场的最大特点是假设输出变量之间的联合概率分布构成概率无向图模型，即马尔可夫随机场。条件随机场是判别模型。\n",
        "\n",
        "3．线性链条件随机场是定义在观测序列与标记序列上的条件随机场。线性链条件随机场一般表示为给定观测序列条件下的标记序列的条件概率分布，由参数化的对数线性模型表示。模型包含特征及相应的权值，特征是定义在线性链的边与结点上的。线性链条件随机场的数学表达式是  \n",
        "\n",
        "$P(y|x)=\\frac{1}{Z(x)}exp(\\sum_{i,k} \\lambda_{k}t_{k}(y_{i-1}, y_{i}, x, i) + \\sum_{i,l}\\mu_{l}S_{l}(y_{i}, x, i))$  \n",
        "\n",
        "其中，  \n",
        "\n",
        "$Z(x)=\\sum_{y}exp(\\sum_{i,k}\\lambda_{k}t_{k}(y_{i-1}, y_{i}, x, i) + \\sum_{i,l}\\mu_{l}S_{l}(y_{i}, x, i))$\n",
        "\n",
        "\n",
        "4．线性链条件随机场的概率计算通常利用前向-后向算法。\n",
        "\n",
        "5．条件随机场的学习方法通常是极大似然估计方法或正则化的极大似然估计，即在给定训练数据下，通过极大化训练数据的对数似然函数以估计模型参数。具体的算法有改进的迭代尺度算法、梯度下降法、拟牛顿法等。\n",
        "\n",
        "6．线性链条件随机场的一个重要应用是标注。维特比算法是给定观测序列求条件概率最大的标记序列的方法。"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "SL_XKnt7H4Q2",
        "colab_type": "text"
      },
      "source": [
        "#### 例 11.1"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "7kx_vjWOFGSd",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "import numpy as np"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "RXUzhi7VHy2B",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 53
        },
        "outputId": "19c4e0bf-f4e7-4533-8d25-d8b2a5822d31"
      },
      "source": [
        "#这里定义T为转移矩阵列代表前一个y(ij)代表由状态i转到状态j的概率,Tx矩阵x对应于时间序列\n",
        "#这里将书上的转移特征转换为如下以时间轴为区别的三个多维列表，维度为输出的维度\n",
        "T1 = [[0.6, 1], [1, 0]]\n",
        "T2 = [[0, 1], [1, 0.2]]\n",
        "#将书上的状态特征同样转换成列表,第一个是为y1的未规划概率，第二个为y2的未规划概率\n",
        "S0 = [1, 0.5]\n",
        "S1 = [0.8, 0.5]\n",
        "S2 = [0.8, 0.5]\n",
        "Y = [1, 2, 2]  #即书上例一需要计算的非规划条件概率的标记序列\n",
        "Y = np.array(Y) - 1  #这里为了将数与索引相对应即从零开始\n",
        "P = np.exp(S0[Y[0]])\n",
        "for i in range(1, len(Y)):\n",
        "    P *= np.exp((eval('S%d' % i)[Y[i]]) + eval('T%d' % i)[Y[i - 1]][Y[i]])\n",
        "print(P)\n",
        "print(np.exp(3.2))"
      ],
      "execution_count": 6,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "24.532530197109345\n",
            "24.532530197109352\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7lUipRWFMVB7",
        "colab_type": "text"
      },
      "source": [
        "#### 例 11.2"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "TG6SEYCbMXty",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 35
        },
        "outputId": "b54d58b8-60b1-4dac-b03e-8816458c5ae2"
      },
      "source": [
        "#这里根据例11.2的启发整合为一个矩阵\n",
        "F0 = S0\n",
        "F1 = T1 + np.array(S1 * len(T1)).reshape(np.asarray(T1).shape)\n",
        "F2 = T2 + np.array(S2 * len(T2)).reshape(np.asarray(T2).shape)\n",
        "Y = [1, 2, 2]  #即书上例一需要计算的非规划条件概率的标记序列\n",
        "Y = np.array(Y) - 1\n",
        "\n",
        "P = np.exp(F0[Y[0]])\n",
        "Sum = P\n",
        "for i in range(1, len(Y)):\n",
        "    PIter = np.exp((eval('F%d' % i)[Y[i - 1]][Y[i]]))\n",
        "    P *= PIter\n",
        "    Sum += PIter\n",
        "print('非规范化概率', P)"
      ],
      "execution_count": 14,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "非规范化概率 24.532530197109345\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "fW5RZz89NPsD",
        "colab_type": "text"
      },
      "source": [
        "#### Reference: https://nbviewer.jupyter.org/github/fengdu78/lihang-code/blob/master/%E7%AC%AC11%E7%AB%A0%20%E6%9D%A1%E4%BB%B6%E9%9A%8F%E6%9C%BA%E5%9C%BA/11.CRF.ipynb"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "P6FeGVCCM7zC",
        "colab_type": "text"
      },
      "source": [
        "### 其实，我还是没搞懂CRF，没有在具体的项目中使用。PGM本身就是一个很大的topic，就这简简单单的一章无法全部解释。"
      ]
    }
  ]
}